HETEROLOGOUS PROTEIN EXPRESSION

All documents cited herein are incorporated by reference in their entirety. 



TECHNICAL FIELD 

This invention is in the field of the recombinant expression of proteins in heterologous hosts. 

BACKGROUND ART

Recombinant expression of proteins is of huge importance. For convenience, bacterial hosts such as 
E.coli are typically used. Where bacterial hosts are unsuitable (e.g. where protein glycosylation or 
other modifications are desired, or where proteins are not expressed for one reason or another) it is 
common to choose a yeast host, a baculovirus host, or perhaps a cell line derived from a higher 
eukaryote, such as a CHO cell line. Plants are also used as recombinant expression hosts.

Although recombinant protein expression is often routine, with off-the-shelf kits being available for 
general use, many proteins cannot easily be expressed in this way. Bacterial hosts often give 
insoluble proteins which must be purified and re-folded from inclusion bodies, and do not offer 
eukaryotic post-translational modifications. Yeasts (including Saccharomyces) grow poorly when
minimal media are required by the selection systems that are commonly used, and Pichia systems [1]
are generally useful only for secreted proteins. The baculovirus and CHO systems are cumbersome 
and expensive, and do not store well by freezing. Plant systems are at an early stage and extensive 
post-expression processing is required. Moreover, transformed hosts are typically unstable such that 
it is constantly necessary to impose selective conditions to prevent reversion to a non-transformed
state e.g. by loss of expression plasmids, etc. For these reasons, hosts such as Saccharomyces are
seen as poor choices for general recombinant expression. 

Thus there remains a need for an expression system which avoids the need for expensive reagents, 
which is genetically stable, which can be frozen well, which can grow quickly and abundantly, and 
which can produce eukaryotic proteins in a soluble and active form. It is an object of the invention to 
provide an improved expression system to address these needs.

DISCLOSURE OF THE INVENTION 

The invention is based on the use of a new class of selection marker in expression vectors. 

Selection markers used in prior art systems are often based on including a resistance gene in the 
vector e.g. an antibiotic resistance gene (e.g. ampicillin resistance, ampR), a drug resistance gene 
(e.g. neomycin resistance), a herbicide resistance gene (e.g. glyphosate resistance), the HPRT/HAT
system, etc. When used with a host that is naturally sensitive to the factor in question, the resistance
genes mean that only transformed cells can survive in a medium containing the factor.

Other selection markers are based on auxotrophic hosts i.e. those which require a particular factor in
order to survive. Auxotrophic host systems are by far the most commonly used for yeasts [2], usually
using URA3 (for uracil auxotrophs), LEU2 (for leucine auxotrophs), TRP1 (for tryptophan
auxotrophs) or HIS3 (for histidine auxotrophs) to complement the mutations in the auxotrophic host
and confer prototrophy. The hosts can grow in rich medium, but growth in a medium lacking an 
essential factor (e.g. lacking leucine) leads to cell death. Inclusion of a survival gene (e.g. the 
3-isopropylmalate dehydrogenase encoded by LEU2) on a plasmid ensures that growth in the
appropriate minimal medium selects only transformants. On transfer to a rich medium, where 
selection pressure is absent, auxotrophic hosts tend to lose plasmids encoding the selection markers.

These prior art selection systems are based on using a growth medium in which only transformants
can survive, either by including the lethal factor (transformants are resistant) or by omitting the 
essential factor (transformants are not auxotrophic). The markers are thus conditional, as the 
selection pressure applies only under certain conditions. In contrast, the selection markers used 
according to the present invention are non-conditional i.e. the selection pressure is absolute. The 
markers involved are genes which encode essential survival factors, and loss of the marker gene (e.g. 
by loss of the expression vector) is lethal. By avoiding resistance markers, lethal factors (e.g.
antibiotics) do not have to be added to culture media, thus simplifying the culture process, reducing 
costs and avoiding contamination of the expressed protein. By avoiding auxotrophic hosts, cells can 
be grown in rich media rather than in minimal media, thereby giving much better growth rates. 

Thus the invention provides a cell that expresses both chromosomal genes and extra-chromosomal 
genes, wherein (a) the expressed extra-chromosomal genes include a gene with an essential function, 
the expression of which is unconditionally required for survival of the cell, (b) the expressed 
chromosomal genes do not provide that essential function, and (c) the extra-chromosomal genes 
include a heterologous gene, the expression of which is controlled by a promoter that is functional in 
the cell. Loss of the extra-chromosomal essential gene is lethal to the cell.

The invention also provides a method for expressing a heterologous gene, comprising the step of 
growing a cell of the invention in a culture medium. The invention also provides a method for 
purifying a protein, comprising the steps of: (a) growing a cell of the invention such that it expresses 
said protein; and (b) purifying the protein. The method may involve the step of: (c) treating the 
protein with a protease to provide a cleavage product of interest, and this step (c) may follow step (b) 
or may be an intrinsic part of step (b). 

The cell of the invention can be constructed in two steps, as illustrated for yeast in Figure 6 and as 
described below. The invention uses a starting cell that expresses both chromosomal genes and 
extra-chromosomal genes, wherein (a) the expressed extra-chromosomal genes include a gene with 
an essential function, the expression of which is unconditionally required for survival of the cell, 
(b) the expressed chromosomal genes do not provide that essential function, and (c) the
extra-chromosomal genes include a conditionally-lethal gene. The invention also provides an
intermediate cell which expresses chromosomal genes, a first set of extra-chromosomal genes and a
second set of extra-chromosomal genes, wherein (a) the expressed first and second sets of 
extra-chromosomal genes both include a gene with the same essential function, the expression of 
which is unconditionally required for survival of the cell, (b) the expressed chromosomal genes do 
not provide that essential function, (c) the first set of extra-chromosomal genes includes a
conditionally-lethal gene, and (d) the second set of extra-chromosomal genes includes both a 
conditionally-required gene and a heterologous gene. 

The invention also provides an extra-chromosomal vector, comprising: (a) an essential gene whose 
expression is unconditionally required for survival of a cell of interest; (b) a conditionally-required 
gene to allow selection of host cells which include the extra-chromosomal vector; and (c) a gene
encoding a heterologous protein of interest operably linked to a promoter that is functional in the cell 
of interest. 

The invention also provides a method for preparing a cell of the invention, comprising the steps of: 
(a) obtaining a starting cell, which expresses a conditionally-lethal gene; (b) transforming the starting 
cell with an extra-chromosomal vector of the invention; (c) selecting transformants which express the
vector's conditionally-required gene; and then (d) selecting transformants which lose the 
conditionally-lethal gene. 

The invention alternatively provides a cell which expresses chromosomal genes and 
extra-chromosomal genes, wherein (a) the expressed extra-chromosomal genes include an essential 
gene whose expression is unconditionally required for survival of the cell, (b) the expressed
chromosomal genes do not include said essential gene, and (c) the extra-chromosomal genes include 
a heterologous gene, the expression of which is controlled by a promoter that is functional in the cell. 

Essential genes 

The invention is based on the use of genes with essential functions as selection markers. Vectors 
encoding heterologous products of interest also encode the essential gene. As loss of the essential
function is unconditionally lethal, the selection pressure for cells which contain the vector is absolute
i.e. surviving cells must contain the vector with both the essential gene and the heterologous gene.

The essential gene can be any gene whose loss prevents the growth of cells e.g. the loss prevents cell 
division, prevents mitosis, prevents transcription, prevents translation, or prevents any other 
metabolic process which is essential for survival in culture. A gene is not an "essential gene" if its
expression is required for survival only under certain conditions e.g. ampR is essential in the
presence of ampicillin, but it is not essential under other circumstances, and so ampR is not an
"essential gene": its loss is not unconditionally lethal. In contrast, a change in growth conditions
cannot compensate for the loss of an "essential gene".

The identification of essential genes is straightforward e.g. using knockout studies, etc. Reference 3
lists various essential genes in E.coli, including some which are only conditionally-lethal, and the
profile of the E.coli chromosome in reference 4 classifies genes as non-essential or essential.
Reference 5 lists various essential genes for yeast, and the EUROSCARF [6] and EUROFAN [7,8]
projects have also identified essential genes in yeast. EUROFAN defines an essential gene as one
which is "imperative for the vegetative life cycle of a yeast cell grown on rich YPD media at 30°C"
and estimated that 16-18% of yeast genes were essential on the basis that "a strain deleted for such a
gene cannot grow on YPD at 30°C". As well as these functional studies, genomics (particularly
comparative genomics) is often used to identify essential genes [9], and has been applied to E.coli,
yeasts, Mycobacterium tuberculosis [10], etc. A further approach to identifying essential genes is
given in reference 11. The DEG "database of essential genes" [12,13] is a further source. The skilled
person is thus readily able to identify various genes whose absence cannot be tolerated by a host.

The essential gene is preferably short e.g. with a coding sequence (start codon to stop codon
inclusive) of <3000 base pairs (e.g. <2500 bp, <2000 bp, <1500 bp, <1250 bp, <1000 bp, or shorter).
The use of short genes is preferred because it reduces the potential for duplication of restriction sites
within a vector. If restriction sites are duplicated, however, then codons can be changed to remove
the recognition sequence without changing the encoded amino acid(s) or, as an alternative, the vector
may be equipped for ligase independent cloning (LIC) as described below.
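
By way of illustration only, the removal of a duplicated restriction site by silent codon changes can be sketched as follows. This is a minimal sketch and not part of the described method: the function names (translate, count_sites, remove_site) are hypothetical, the standard genetic code is assumed, and the coding sequence is assumed to be in frame with a length divisible by three.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def count_sites(seq, site):
    """Count occurrences of a recognition sequence, including overlapping ones."""
    return sum(seq.startswith(site, i) for i in range(len(seq)))


def remove_site(cds, site):
    """Remove every copy of `site` from `cds` using synonymous codon changes only."""
    cds = cds.upper()
    while site in cds:
        start = cds.find(site)
        before = count_sites(cds, site)
        first_codon = start // 3
        last_codon = (start + len(site) - 1) // 3
        # Try each codon that overlaps the first remaining site.
        for codon_index in range(first_codon, last_codon + 1):
            pos = codon_index * 3
            old = cds[pos:pos + 3]
            synonyms = [c for c, aa in CODON_TABLE.items()
                        if aa == CODON_TABLE[old] and c != old]
            candidates = [cds[:pos] + c + cds[pos + 3:] for c in synonyms]
            # Accept only changes that strictly reduce the number of sites,
            # so the outer loop is guaranteed to terminate.
            better = [s for s in candidates if count_sites(s, site) < before]
            if better:
                cds = better[0]
                break
        else:
            raise ValueError("no synonymous change removes the site at %d" % start)
    return cds


# Example: an EcoRI site (GAATTC) spanning the Glu and Phe codons.
original = "ATGGAATTCTAA"
fixed = remove_site(original, "GAATTC")
assert translate(fixed) == translate(original)
```

The sketch takes the first synonymous codon that works; a practical design would probably also weigh codon usage in the host when choosing among synonyms.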

One advantage of the invention is that high copy numbers of the heterologous gene can be obtained,
and this is accompanied by hyper-expression of the essential gene. Thus the essential gene is 
preferably not lethal when hyper-expressed. To achieve maximum copy number, it is preferred that 
the essential gene should be required by the host at high levels. 

Preferred essential genes include those which encode polypeptides with (a) a molecular weight of
less than about 40kDa (e.g. <30kDa, <20kDa, or <10kDa), and/or (b) reasonable cellular abundance
as indicated by their codon adaptation indices (CAI [14]) of more than about 0.3. Genes which
satisfy these criteria in yeast include: CDC33, COF1, EFB1, ERG25, FBA1, GPM1, GSP1, GUK1,
HEM13, HSP10, IPP1, NHP2, NOP1, NOP10, NTF2, PFY1, PSA1, RLP24, RPB10, RPC10, RPL5,
RPL10, RPL15A, RPL17A, RPL18A, RPL25, RPL28, RPL30, RPL32, RPL33A, RPL43A, RPP0,
RPS2, RPS3, RPS5, RPS13, RPS15, RPS20, RPS31, SAR1, SEC14, SMT3, SNU13, SSS1, SUI2,
TIF11, TPI1, VRG4, and YRB1.
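
The two criteria above lend themselves to a simple screen. The following is a minimal sketch only: the function names are hypothetical, the molecular weight is estimated from protein length using an assumed average of about 0.11 kDa per residue rather than being calculated from the actual sequence, and the CAI value is taken as an input rather than computed.

```python
# Assumption: rough average mass per amino acid residue, used only for estimation.
AVERAGE_RESIDUE_KDA = 0.11


def estimated_mass_kda(length_aa):
    """Approximate molecular weight of a polypeptide from its length."""
    return length_aa * AVERAGE_RESIDUE_KDA


def is_preferred_marker(length_aa, cai, max_kda=40.0, min_cai=0.3, require_both=True):
    """Apply criterion (a) mass below max_kda and criterion (b) CAI above min_cai.

    With require_both=False either criterion alone is sufficient ("and/or").
    """
    small = estimated_mass_kda(length_aa) < max_kda
    abundant = cai > min_cai
    return (small and abundant) if require_both else (small or abundant)


# A 212 aa protein with a CAI of 0.387 is estimated at about 23 kDa and passes both tests.
print(is_preferred_marker(212, 0.387))
```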

Preferred essential genes include those involved in cell cycle control and/or involved in mitosis. 
A preferred essential gene for use with the invention is MOB1, whose expression is absolutely 
required for completion of mitosis and maintenance of ploidy in yeast [15]. The yeast gene is less 
than 750 bp in length, and hyper-expression of the encoded Mob1 protein is tolerated.

Another preferred essential gene for use with the invention is Cdc33 (also known as eIF4E) which 
recognises the 7-methylguanosine-containing cap of mRNA in the first step of mRNA recruitment 
for translation. The Cdc33 protein has 212 aa in yeast and is abundant as judged by direct assays and 
by its CAI index of 0.387. Furthermore, since CDC33 encodes a translation factor, increased expression
levels caused by copy number amplification may have a beneficial effect on heterologous protein
expression. Over-expression of CDC33 can cause slow growth but this effect can be overcome in a
Δcln3 or Δcln2 background [16] and should not matter anyway over a typical 4-8 hr induction period.

Another preferred essential gene for use with the invention is Hsp10, which is a 10kDa mitochondrial
chaperonin in yeast (homologue of E.coli GroES) that regulates the Hsp60 chaperonin [17]. Hsp10 is
involved in protein folding and sorting in mitochondria.

Other essential genes for use with the invention can be identified empirically e.g. by the use of 
chromosomal knockout techniques to identify lethal knockout mutations, combined with a test for 
whether the lethal effect can be reversed by supplying a copy of the knocked-out gene on a plasmid. 

In cells of the invention, the essential gene is expressed from an extra-chromosomal element rather
than from a chromosomal site. Loss of the extra-chromosomal gene results in death of the cell. 

The use of an essential gene makes the system inherently stable and so is preferable to the use of a 
resistance gene for several reasons. For instance: the need for minimal selective media is avoided, 
thus giving higher growth rates; there is no risk of the final product being contaminated by the 
resistance molecule e.g. antibiotic contamination; and, for cells such as yeasts, the need for expensive
anti-microbials is avoided. 

As the invention utilises genes that are essential, the absence of that gene from a host's 
chromosome(s) means that a functional copy of the gene has been lost from the chromosome, to be 
replaced by the extra-chromosomal gene. It will be understood that the replacement gene need not be
precisely the same as the gene which has been lost. Tolerable differences include point mutations that
change the gene's sequence without changing the encoded amino acid sequence, point mutations that
change the encoded amino acid sequence without functional consequence, the addition of fusion
sequences, and the use of a gene that is different from the lost chromosomal copy (e.g. from a
different species, or even a different type of organism) but which is functionally able to complement
that loss. Taking S.cerevisiae as an example, therefore, the host could lack an essential gene which is
complemented by the corresponding gene from S.pombe or from any other eukaryote. The use of a 
non-identical gene which is less efficient than the native chromosomal gene can further enhance copy 
number amplification, as described below. However, the use of extra-chromosomal genes which are 
the same as those found wild-type in the host organism's chromosome is not excluded. 

Preparing the cell

Cells of the invention have lost an essential gene on their chromosome(s), but complement that loss 
using an extra-chromosomal copy of the gene. As loss of an essential gene cannot be tolerated, it is 
not feasible to make cells of the invention simply by deleting the chromosomal copy and then 
transforming the mutant cells with a vector encoding the gene, because death means that there is no 
way of selecting for cells which lack the essential gene. Instead, cells of the invention can be
prepared by means of "plasmid shuffling" [18], involving a transitional stage where cells possess the
essential gene in two separate extra-chromosomal forms (e.g. see Figure 6).

The overall shuffling process begins with a mutant cell that lacks a chromosomal copy of an essential 
gene, but which possesses a replacement copy on a first vector, which vector also contains a 
conditionally-lethal marker. A second vector of the invention (carrying (a) a further replacement
essential gene, (b) a conditionally-essential marker, and (c) a heterologous gene) is then used, and 
transformants are selected on the basis of the vector's conditionally-selective marker. At this stage 
the cell contains two extra-chromosomal copies of the essential gene, one on a first vector which 
contains a negative selection marker and one on a second vector which contains a positive selection 
marker and a heterologous gene. Loss of either vector leads to retention of the essential gene, but 
only the second vector is useful for heterologous protein expression. Thus the process then proceeds 
to eliminate cells which retain the first vector, thereby selecting cells which possess only the second 
vector. This final selection uses the first vector's conditionally-lethal marker, to yield cells in which 
the essential gene and the heterologous gene are encoded by the same vector. The overall effect of 
this process, therefore, is to replace the first vector with the second vector. Cells which lose both 
vectors lose the essential gene and thus die. 
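
The logic of the shuffling process can be sketched as a small simulation. This is an illustrative model only, not the protocol itself: the marker names (URA3 as the conditionally-lethal marker counterselected with FOA, TRP1 as the positive selection marker), the medium labels and all function names are assumptions chosen for the example.

```python
from itertools import combinations

# First (covering) vector: essential gene plus a conditionally-lethal marker.
COVERING = frozenset({"ESSENTIAL", "URA3"})
# Second (expression) vector: essential gene, positive selection marker, heterologous gene.
EXPRESSION = frozenset({"ESSENTIAL", "TRP1", "HETEROLOGOUS"})


def genes(cell):
    """All extra-chromosomal genes carried by a cell (a tuple of plasmids)."""
    return set().union(*cell)


def survives(cell, medium):
    """Decide survival of a cell whose chromosome lacks the essential gene."""
    carried = genes(cell)
    if "ESSENTIAL" not in carried:
        return False  # unconditional: lethal on every medium
    if medium == "-Trp" and "TRP1" not in carried:
        return False  # positive selection for the second vector
    if medium == "FOA" and "URA3" in carried:
        return False  # negative selection against the first vector
    return True


def select(population, medium):
    return [cell for cell in population if survives(cell, medium)]


def segregants(cell):
    """Every combination of plasmids a descendant could retain after plasmid loss."""
    plasmids = list(cell)
    return [tuple(sub) for r in range(len(plasmids) + 1)
            for sub in combinations(plasmids, r)]


# Step 1: transform starting cells and select transformants (intermediate cells).
after_transformation = [(COVERING,), (COVERING, EXPRESSION)]
intermediates = select(after_transformation, "-Trp")

# Step 2: allow plasmid loss, then counterselect against the first vector.
descendants = [d for cell in intermediates for d in segregants(cell)]
final = select(descendants, "FOA")
print(final == [(EXPRESSION,)])
```

In this model the only descendants that survive counterselection carry the second vector alone, and cells that have lost both vectors die on any medium, which mirrors the outcome described above.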

The invention can be performed much more quickly than existing eukaryotic expression systems, 
such as Pichia and baculovirus, and essentially as quickly as with advanced bacterial expression 
systems. Once the desired DNA fragment is cloned into the plasmid of the invention, a yeast host 
expressing high levels of the protein can be prepared in less than two weeks.

Overall, the shuffling process involves: (a) a host cell with an inactive chromosomal essential gene, 
complemented by a 'covering' plasmid which supplies the essential gene and contains a 
counterselection marker; and (b) an expression plasmid which also supplies an essential gene and 
contains the heterologous gene of interest (usually under the control of a repressible promoter) plus a
selection marker. The shuffling protocol swaps the two plasmids without going via a stage where the 
extra-chromosomal essential gene is lost. 

In S.cerevisiae a covering plasmid will generally include the URA3 counterselection marker, the 
expression plasmid will include a selection marker (e.g. auxotrophic marker), and the expression of 
the heterologous product will be controlled by glucose repression (and galactose induction) of GAL1-10. The URA3 marker
advantageously allows selection of starting cells which contain the covering plasmid and also, using 
FOA, allows counterselection of intermediate cells. Similar considerations apply in S.pombe, 
although the heterologous product may be controlled by thiamine repression of the nmt1 promoter.

In E.coli and other applicable bacteria a covering plasmid may include the sacB gene from B.subtilis.
This gene prevents growth on sucrose, permitting counterselection. Unlike URA3 the sacB gene does
not also allow a positive selection and so the covering plasmid will also include a marker such as
kanR for selecting suitable starting cells.

As an alternative to the sacB system, the rpsL system can be used. Cells carrying the wild type rpsL
(StrS) are sensitive to streptomycin, but many rpsL mutations give streptomycin resistance (StrR).
If a cell has both StrS and StrR genes, however, it remains sensitive to streptomycin. A covering
plasmid can thus contain wild-type rpsL and kanR. Using a StrR starting cell and an expression
plasmid with ampR the intermediate cells can be selected based on ampicillin resistance. Loss of the
covering plasmid can then be selected based on streptomycin resistance.

The combined use of the sacB and strA systems in E.coli is described in reference 19.

The invention uses a starting cell which expresses chromosomal genes and extra-chromosomal
genes, wherein (a) the expressed extra-chromosomal genes include an essential gene whose
expression is unconditionally required for survival of the cell, (b) the expressed chromosomal genes
do not include said essential gene, and (c) the extra-chromosomal genes include a
conditionally-lethal gene. Suitable starting cells have been described in the art for various essential
genes [e.g. 20,21]. The invention provides a starting cell, characterised in that (i) the cell is a
S.cerevisiae yeast, and (ii) the essential gene is MOB1, Cdc33 or Hsp10.

As an alternative to using a plasmid shuffling approach, it is possible to prepare cells of the invention 
from diploid cells that are hetero-allelic for an essential gene i.e. cells that contain a diploid genome 
but which express a functional form of the essential gene from only one haploid set of chromosomes. 
The hetero-allelic cell is transformed with a plasmid encoding both the essential gene and the 
heterologous gene of interest and, after sporulation, haploids lacking a functional chromosomal gene 
are selected [22]. This technique is more complicated than plasmid shuffling, but may be preferred if
there is frequent recombination between chromosomes and shuffling plasmids. 

Extra-chromosomal genes and vectors 

Cells of the invention include extra-chromosomal genes, which are located on an extra-chromosomal 
vector. Such vectors do not include DNA of the mitochondria, chloroplasts or kinetoplasts (where
applicable). Preferred vectors are capable of autonomous replication i.e. their copy number can
exceed the copy number of the host cell's own chromosome(s). Preferred vectors are non-integrating 
(unlike the situation with prior art Pichia systems). The extra-chromosomal genes will generally be 
found on a plasmid or in a viral vector. 

Plasmids of the invention include an essential gene, such that (a) the plasmid can complement the 
lack of that gene in a host's chromosome, and (b) loss of the plasmid is lethal to the cell.

Plasmids of the invention also include a heterologous gene. 

Plasmids of the invention will usually also include a conditionally-required gene. This gene is not
required for survival of a cell of the invention, but may be used during the cell's preparation (see
below). Conditionally-required genes allow transformants to be selected under appropriate selective
growth conditions, and may confer resistance to an otherwise-toxic substance (e.g. an antibiotic
resistance gene, such as ampR, kanR, tetR, hyg, etc.; a drug resistance gene, such as aad, ble,
hpt, nptII, aphII, gat, pac, neoR, etc.; a herbicide resistance gene, such as bar, pat, csr1-1,
epsp, etc.; and other resistance genes, such as Me, bsd, gpt, hisD, trpB, hprt, tk) or treatment (e.g.
irradiation, mutagenesis), or may complement an auxotrophic mutation in the host's chromosome
(e.g. the URA3, LEU2, TRP1, HIS3, LYS2, ADE2, ADE3 genes; etc.). A preferred conditionally-
required gene is TRP1, which can be used to select yeast transformants on the basis of growth in a
Trp-free medium.

Other plasmids used in preparing host cells of the invention (e.g. plasmids used to prepare starting
cells, and retained in intermediate cells of the invention) include the same essential gene as described
above, but include a conditionally-lethal gene for counterselection. Cells containing these plasmids
can thus be selectively killed. Typical conditionally-lethal genes encode proteins which convert
non-toxic substances into toxic substances, and examples include, but are not limited to: URA3
(lethal in the presence of 5-fluoroorotic acid, FOA); LYS2 (lethal in the presence of α-aminoadipic
acid as the primary nitrogen source); CAN1 (lethal in the presence of canavanine and absence of
arginine); CYH2 (lethal in the presence of cycloheximide); Tk or thymidine kinase (lethal in the
presence of ganciclovir or acyclovir); Cd or cytosine deaminase (lethal in the presence of
5-fluorocytosine); Ntr or nitroreductase (lethal in the presence of CB1954); sacB from B.subtilis
(lethal in the presence of sucrose); rpsL and mutant rpsL (selection based on streptomycin sensitivity/
resistance); etc.

Some conditionally-required genes (for "positive selection") can also be used as conditionally-lethal 
genes (for "negative selection"), depending on growth conditions. For example, URA3 is a 
conditionally-required gene for uracil auxotrophs, but it is lethal when growth occurs in the presence 
of FOA. Similarly, thymidine kinase offers a salvage pathway in the presence of aminopterin, but is 
lethal in the presence of acyclovir. Where a process of the invention uses both a conditionally- 
required gene and a conditionally-lethal gene, however, different genes will usually be used. 
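
The reason for using different genes for the two roles can be illustrated with a short lookup. This is a sketch under stated assumptions: the table contains only those conditionally-lethal markers listed above whose lethal condition is a single added substance (LYS2, CAN1 and rpsL are omitted because their conditions are compound), and the function name is hypothetical.

```python
# Marker -> substances in whose presence the marker is lethal (from the list above).
LETHAL_WITH = {
    "URA3": {"5-fluoroorotic acid"},
    "CYH2": {"cycloheximide"},
    "Tk": {"ganciclovir", "acyclovir"},
    "Cd": {"5-fluorocytosine"},
    "Ntr": {"CB1954"},
    "sacB": {"sucrose"},
}


def counterselection_agents(covering_markers, expression_markers):
    """Agents that kill cells retaining the covering plasmid but spare the expression plasmid."""
    kills_covering = set().union(
        *(LETHAL_WITH.get(m, set()) for m in covering_markers))
    kills_expression = set().union(
        *(LETHAL_WITH.get(m, set()) for m in expression_markers))
    return sorted(kills_covering - kills_expression)


# Different markers on the two plasmids: FOA removes only the covering plasmid.
print(counterselection_agents({"URA3"}, {"TRP1"}))
# The same marker on both plasmids leaves no usable counterselection agent.
print(counterselection_agents({"URA3"}, {"URA3"}))
```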

As well as (a) the essential gene, (b) the conditionally-required gene, and (c) the optional
heterologous gene, plasmids of the invention will typically include one or more of the following
elements: (i) an origin of replication functional in a host cell of interest (e.g. functional in yeast, such
as an ars1 element or, more preferably, a 2µ ori element); (ii) a polylinker or multi-cloning site
containing a plurality (e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) of restriction sites in the same or,
preferably, in different reading frames e.g. see Figure 4; (iii) a transcription termination sequence
(e.g. T-ADH1, T-CYC1, etc.) and/or additional stop codons (TGA, TAA and/or TAG) downstream of
one or more (preferably all) of the promoters and their coding sequences in the plasmid; and (iv) a
stabilising sequence, such as stb. Transcription termination sequences can be included as part of a
heterologous insertion rather than as part of a starting vector.

To function as a shuttle vector between eukaryotes and bacteria, thereby simplifying preparative 
work, the plasmid may also include one or more of: (v) an origin of replication functional in bacteria, 
such as the ColE1 origin of replication; and (vi) an antibiotic resistance marker suitable for selection
of bacterial transformants. As an alternative to using bacteria for preparative work, gap repair cloning 
[23] can be used. 

Where a vector is for bacterial expression and is used in a shuffling procedure, an intermediate cell of 
the invention will include both a covering plasmid and an expression plasmid. The origins of
replication in these plasmids should be of different compatibility groups to ensure that they can
occupy the same cell during shuffling (e.g. one ColE1-based plasmid and one p15A-based plasmid).

Heterologous genes 

Plasmids used in cells of the invention, and in intermediate cells, include a heterologous gene i.e. a 
gene not naturally expressed in the organism in which the plasmid is propagated. Transcription of the
heterologous gene will generally be under the control of a promoter that is functional in the host cell, 
as expression of the gene cannot be achieved using a promoter that is inactive in the cell. 

The heterologous gene preferably comprises a coding sequence from a eukaryote, more preferably 
from a higher eukaryote. For example, the heterologous gene may comprise an animal sequence e.g. 
from a mammal, such as a human sequence. As an alternative, the heterologous gene may comprise a
coding sequence from a virus (preferably a eukaryotic virus), a parasite, a pathogenic bacterium, etc. 

Various types of heterologous genes can be used: (a) one type of heterologous gene is a sequence
which encodes a polypeptide that is useful during protein purification, and to which a further
sequence of interest may be fused to give fusion polypeptides; (b) a second type of heterologous gene
is a sequence which encodes a fusion polypeptide, comprising a sequence useful during protein
purification, fused to a further sequence of interest; (c) a third type of heterologous gene is a
sequence of interest without any fusion sequence. Fusion expression (b) of a protein of interest is
typical, but direct expression (c) is also useful. A gene sequence useful during protein purification (a)
will not typically be expressed as a protein for its own sake but will be used as a starting material for
preparing a fusion construct (b).

Polypeptides commonly used as fusion partners to assist in purification include, but are not limited
to: glutathione-S-transferase (GST), purified using immobilised glutathione [24]; poly-histidine tags,
purified by IMAC [25]; calmodulin-binding peptide (CBP), purified using immobilised calmodulin;
maltose-binding protein (MBP), purified using immobilised amylose; a chitin-binding domain
(CBD), purified by binding to chitin; secretory signals; and the Flag epitope (DYKDDDDK) [26],
haemagglutinin epitope (YPYDVPDYA, HA-tag), VSV-G epitope, thioredoxin or c-myc epitope
(EQKLISEEDL), purified by specific immunoaffinity chromatography. Thus a plasmid of the
invention may include a sequence that encodes one of these polypeptides, optionally fused to a
further sequence of interest. These two elements may be arranged in either order, N-terminus to
C-terminus, but it is typically preferred to have the further sequence downstream of (i.e. fused to the
C-terminus of) the purification sequence.

The ability to express proteins as GST-fusions is an advantage over Pichia systems, as GST-fusions
in Pichia typically fail to bind to immobilised glutathione. The ability to use poly-histidine tags is
also an advantage over Pichia, where alcohol dehydrogenase protein co-purifies on IMAC columns.
The invention avoids these difficulties. 

Where the heterologous sequence is designed for fusing to further sequences, or where it is fused to a 
further sequence, it is typical to include a protease recognition sequence at the junction between the 
two (i.e. at or near the 3' or 5' end of the heterologous sequence). A protease can then be used to 
generate the protein of interest without its purification tag. The proteolytic cleavage can take place 
after purification of the fusion protein or, to simplify purification, can take place while the fusion 
protein is immobilised on an affinity column, allowing the cleaved protein of interest to elute while 
the purification tag remains immobilised. Protease recognition sites include, but are not limited to: 
VPR/GS (Thrombin); IEGR (Factor Xa Protease); DDDDK (Enterokinase); ENLYFQ/G 
(endopeptidase rTEV from tobacco etch virus); and LEVLFQ/GP (human rhinovirus protease 3C). 
As an alternative to using a protease recognition sequence, a self-cleaving protein can be constructed 
based on inteins [27,28].
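
As an illustration of how the recognition sequences listed above separate a protein of interest from its purification tag, the following minimal sketch locates a site in a fusion polypeptide and returns the cleavage products. The dictionary keys and function name are hypothetical, the "/" marks the scissile bond, and for IEGR and DDDDK cleavage immediately after the final residue is assumed here. The example sequences are arbitrary placeholders, not real constructs.

```python
import re

# Recognition sequences as listed above; "/" marks the assumed point of cleavage.
PROTEASE_SITES = {
    "thrombin": "VPR/GS",
    "factor_xa": "IEGR/",      # assumption: cleavage after the final R
    "enterokinase": "DDDDK/",  # assumption: cleavage after the final K
    "tev": "ENLYFQ/G",
    "hrv_3c": "LEVLFQ/GP",
}


def cleave(protein, protease):
    """Split a polypeptide at every occurrence of the protease's recognition site."""
    site = PROTEASE_SITES[protease]
    offset = site.index("/")
    motif = site.replace("/", "")
    cuts = [m.start() + offset for m in re.finditer(re.escape(motif), protein)]
    bounds = [0] + cuts + [len(protein)]
    # Drop empty fragments, which arise if a site ends exactly at the C-terminus.
    return [protein[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


# Placeholder fusion: tag fragment + 3C protease site + protein of interest.
fusion = "MSPILGYWK" + "LEVLFQGP" + "MKTAYIAK"
tag_part, product_part = cleave(fusion, "hrv_3c")
print(tag_part, product_part)
```

With sites of the form X/Y, a few residues of the site (here GP) remain at the N-terminus of the released protein, which is one reason the choice of protease can matter for the final product.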

Prior to use with the invention, the heterologous gene will be prepared in a form suitable for insertion 
into a vector of the invention. This may be by digestion of nucleic acid containing the gene, using 
enzymes that are compatible with the insertion site in the vector of the invention, or by
addition of suitable sequences during preparation e.g. by PCR amplification.



The insert may be suitable for ligase independent cloning ('LIC' [29-31]). For example, the 5' and 3'
regions of the insert may have long stretches (e.g. >15 nucleotides) with a high level of sequence
identity to the ends of the linearised vector (usually long sticky ends), thereby facilitating insertion
of the sequence into the vector without needing ligase (or phosphatase).

The insert sequence may be directly from a natural gene, or may have been modified in some way 
e.g. to remove introns, to change codon usage, to introduce or remove restriction sites, etc.

The invention has been found to be particularly suitable for expression of proteins which have been
difficult to express in existing systems. Lte1 (low temperature essential) [32] is a large yeast protein
(>1400 amino acids) which cannot be expressed in E.coli, but using the invention it has been
successfully expressed in soluble form as a GST-fusion (in both directions, N-terminus to
C-terminus). Thus the heterologous gene may encode a protein with 300 or more amino acids (e.g.
350, 400, 450, 500, 600, 700, 800, 900, 1000 or more), although expression of proteins shorter than
300 amino acids (e.g. 200 or fewer amino acids) is not excluded. Yeast proteins Bfa1 and Bub2 are
found naturally at low levels and were subject to considerable degradation in E.coli expression
systems [33], but have now been expressed at high levels in soluble form as GST-fusions. Expression
of yeast kinases CDC5, CDC15 and CDC28 in E.coli gives inactive proteins, but these three proteins
have been expressed in active soluble form as GST-fusions in yeasts having chromosomal deletions
for these proteins. Mammalian proteins such as Tpl2 have also been successfully expressed as
GST-fusions. Some of these proteins have subsequently been prepared in pure form after thrombin
cleavage to remove the GST moiety. Likewise, soluble SARS virus Nsp13 gene product, a putative
mRNA Cap1 methyl transferase, has been expressed and cleaved from the GST affinity purification
tag using human rhinovirus protease 3C.

Thus the heterologous gene is preferably expressed as a soluble protein, even in fusion form. The 
production of soluble proteins is an advantage when compared to bacterial expression systems. 

Following expression according to the invention, proteins may adopt their native dimeric form in 
solution. Thus the heterologous gene may encode a protein which naturally forms an oligomer, such 
as a dimer, trimer, tetramer, pentamer, hexamer, etc.

For hetero-oligomeric proteins, it is possible to express multiple heterologous genes from the same 
plasmid, but it is preferred to use one plasmid per heterologous gene, in which case the invention 
generally uses one essential gene per monomer i.e. the chromosome of a host for expressing a 
hetero-dimer will have two inactive essential genes, with their functions being complemented by 
different plasmids. Stoichiometric expression can be achieved if the same promoter is used for each
monomer, provided that the plasmids' copy numbers are the same. 

The heterologous gene is generally different from the essential gene.

Control of gene expression

Plasmids for use with the invention include (a) an essential gene, and (b) a conditionally-required 
gene and/or a conditionally-lethal gene. For expression purposes, plasmids of the invention also
include a heterologous gene. Expression of these genes is controlled by upstream promoters. Various 
promoters may be used, but the invention offers better expression if particular promoters are used. 

The essential gene is preferably under the control of a repressible promoter. To increase expression
levels, the invention exploits the background level of "leaky" expression driven by such promoters
even when they are turned "off" e.g. by catabolite repression. As the essential gene is required for the
host cell to survive, but the host cell does not have a copy of the essential gene on its own
chromosome, there is a selective pressure to increase the plasmid's copy number. As the copy
number increases, the overall expression of the essential gene increases such that the combined
background expression is adequate for survival.

By repressing expression of the essential gene, therefore, the invention can achieve a high copy
number of the plasmid. An increase in copy number also gives increased levels of the heterologous
gene, thereby improving expression levels of the protein of interest. The process of the invention
may thus include a step of increasing the copy number of a vector to at least 5 (e.g. to at least 10, 20,
30, 40, 50 or more). The use of "leaky" low level expression to increase copy number is known [34].
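
The selection logic can be illustrated with simple arithmetic. The following sketch assumes, purely for illustration, that basal expression scales linearly with copy number and that survival requires a fixed threshold of essential gene product; the function names and all numerical values are hypothetical and are not measurements.

```python
import math


def min_copy_number(required_units, basal_units_per_copy):
    """Smallest copy number whose combined leaky expression meets the survival threshold.

    Assumes (for illustration only) that basal expression is additive across copies.
    """
    if required_units <= 0 or basal_units_per_copy <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(required_units / basal_units_per_copy)


def induced_output(copies, induced_units_per_copy):
    """Total heterologous expression after induction, under the same linear assumption."""
    return copies * induced_units_per_copy


# Hypothetical values in arbitrary units.
copies = min_copy_number(required_units=100, basal_units_per_copy=2.5)
print(copies)                       # copy number needed under repression
print(induced_output(copies, 200))  # heterologous output once the promoter is induced
```

Under these assumptions, the tighter the repression (the lower the basal expression per copy), the higher the copy number that cells must maintain to survive, and the higher the yield once the heterologous gene is induced.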



Copy number amplification can be further enhanced by using codons in the essential gene which are
non-optimal for the host in question. Where further enhancement of this type is not required,
however, the essential gene may be modified for optimum codon usage.

The heterologous gene is preferably under the control of a promoter that is both repressible and
inducible. Rather than being used to increase copy number, however, this promoter is used to allow
controlled expression of the protein of interest. When there is an increase in copy number of the
plasmid, high levels of heterologous protein expression are achieved. It is thus useful to avoid
expression of the heterologous gene until a desired time to avoid possible toxic effects of
over-expression. For example, if Bfa1 or Clb6 is over-expressed then cells die. Thus the heterologous
gene may encode a protein that is potentially toxic to the host during normal growth.

A typical repressible promoter system for use with the invention is based on the promoters
of Gal1 galactokinase I and Gal10 UDP-glucose 4 epimerase. These are tightly repressed by glucose
but highly activated when galactose is the sole carbon source. In S.cerevisiae, the dual GAL1 and
GAL10 promoters are juxtaposed in nature (within the PGAL1 element) and are transcribed in opposite
directions, and this arrangement of promoters conveniently allows divergent repression of the
essential gene (controlled by one of the pair, in one direction) and the heterologous gene (controlled
by the other member of the pair, in the other direction) [35].

Other repressible promoters include, but are not limited to: the repressible acid phosphatase gene
promoter (PHO5), which is activated at low inorganic phosphate levels [36,37]; the
thiamine-repressible promoter (from nmt1), which is repressed by thiamine [38,39]; the metallothionein
promoter (from MTT1), which is induced by Cd2+ [40]; the copper transport protein promoter (from
CTR3), which is repressed in the presence of copper ions [41]; and a light-switchable system involving a
DNA-binding domain fused to phytochrome and a transcription activation domain fused to PIF3, with
cells grown in a medium containing phycocyanobilin, red light being an activator and far-red light
being a repressor [42]. In bacteria the IPTG-inducible lac promoter can be used.

The heterologous gene and the essential gene may be controlled by separate copies of the same 
promoter. Expression of the two genes is thus controlled together, although over-expression of the
heterologous gene is not generally required for the invention to function. 

To express heterologous proteins according to the invention, a promoter will be activated (e.g. by
addition of an inducer, or by removal of a repressor). While the expressed extra-chromosomal genes
in a cell of the invention must include the essential gene, therefore, the heterologous gene may be
expressed or non-expressed depending on prevailing circumstances.

Yeast engages its ubiquitination system to tag many proteins for degradation at the exit from G1 and
in the later stages of M phase. This tagging can interfere with the yield of some heterologous proteins
in yeast, but can be prevented by arresting cells in early G1 or M phase. Cell cycle arrest can be
achieved in various ways, including the use of α-factor or of cell cycle inhibitors such as nocodazole.
Expression methods of the invention may thus involve the use of such reagents.

During expression of the heterologous gene, a yeast may be in diploid or haploid form.



Host cells 

Because all organisms have essential genes, and the invention is based on the fundamental principle
of moving an essential gene from the chromosome onto an extra-chromosomal element so that
transformants can be selected, the invention is applicable to all organisms, including prokaryotes and
eukaryotes. In particular, the availability of plasmid shuffling protocols for many organisms
facilitates the widespread use of the invention. Because bacterial expression systems are already
well-developed, however, the invention's benefits are most immediately useful in eukaryotes,
including unicellular eukaryotes (such as yeasts) and multicellular eukaryotes (such as animals and
plants). As the use of essential genes as markers avoids the need for antibiotics, however, the
invention offers advantages over conventional systems in situations where even traces of antibiotics
in the purified expression product cannot be tolerated.

The invention is particularly useful for yeasts. Yeast is an inexpensive organism to work with, can be
stored easily by freezing, and has an extensive historical background in expression and genetic
manipulation, and with the sequencing of the S.cerevisiae genome, genomics and proteomics of this
organism have been heavily exploited. Many suitable clones and vectors for expression and selection
are readily available, and these have been extensively studied and characterised. Furthermore, studies
of the yeast proteome have shown that yeasts are extremely tolerant to the expression of genes in the
form of fusion proteins, without loss of solubility or function [43,44].

Preferred yeasts are those which support plasmids and, for assisting in the preparation of cells of the
invention, which exist in haploid and diploid forms. Budding yeasts are particularly preferred.
Yeasts include the following genera: Arthroascus, Arxiozyma, Bullera, Candida, Debaryomyces,
Dekkera, Dipodascopsis, Endomyces, Eremothecium, Geotrichum, Hanseniaspora, Hansenula,
Hormoascus, Issatchenkia, Kloeckera, Kluyveromyces, Lipomyces, Lodderomyces, Metschnikowia,
Pachysolen, Pachytichospora, Pichia, Rhodosporidium, Rhodotorula, Saccharomyces,
Saccharomycodes, Schizoblastosporion, Schizosaccharomyces, Schwanniomyces, Sporobolomyces,
Sterigmatomyces, Sympodiomyces, Taphrina, Torula, Torulaspora, Torulopsis, Trichosporon,
Yarrowia, Zygohansenula, and Zygosaccharomyces. Preferred genera for use with the invention are
Saccharomyces, Schizosaccharomyces and Pichia. Common industrial yeast systems include
Hansenula polymorpha, Kluyveromyces lactis, Yarrowia lipolytica, Saccharomyces carlsbergensis,
Saccharomyces ellipsoideus and Candida utilis, and particularly preferred species for use with the
invention are Saccharomyces cerevisiae (budding or baker's yeast) and Schizosaccharomyces pombe
(fission yeast [45]). Such yeasts are readily available to the skilled person.

Many E.coli strains optimised for recombinant protein expression are available e.g. BL21 and its
derivatives.

The invention does not utilise wild-type cells as hosts, as the invention relies on the absence of an 
essential gene from the host's chromosome, with that absence being complemented by an 
extra-chromosomal copy of the gene. Thus the host's chromosome will be lacking a functional copy 
of an essential gene. Typically, therefore, the invention will use a host that has a knockout genotype 
for the essential gene in question. The knockout may remove or disrupt the whole or part of the 
chromosomal gene, in the regulatory region(s) and/or the coding region(s). Thus remnants of the 
essential gene may remain in the chromosome, but the overall effect will be that the host's 
chromosome cannot be transcribed and/or translated to produce the essential gene product in 
functional form. Knockout of essential genes is known in the prior art [e.g. 20,21] but 
complementation with extra-chromosomal copies of the genes has been used to study the essential 
gene itself rather than as a way of selecting for the presence of a different heterologous gene. 

Knockout by homologous recombination is a preferred method for obtaining suitable host cells, and 
in particular knockout by isogenic deletion. Replacement of a chromosomal gene with a marker gene 
is typical e.g. as a result of homologous recombination to insert an antibiotic resistance gene. Gene 
inactivation methods such as those disclosed in references 46 and 47 can easily be adapted by the 
inclusion of covering plasmids encoding an essential gene prior to the inactivation step. Other 
non-knockout methods of preventing expression of an essential protein include chromatin silencing, 
antisense and RNA silencing (e.g. RNAi) techniques, although such techniques are not preferred due
to their reversible nature and to the difficulty in ensuring that vector-derived genes are not also
inactivated. A further way of eliminating the chromosomal gene's function is by mutagenesis of
codons encoding critical amino acids e.g. a single Arg-522-His mutation in the sigA gene encoding
σA in Mycobacterium smegmatis is lethal, without the need for knockout of the whole coding
sequence [48]. Thus the skilled person can readily generate a host cell in which a chosen essential 
gene has been disabled, either by preventing its expression (either at a transcriptional or translational 
level) or by allowing its expression but in an inactive form. 

In addition to knockout of the essential gene, the host may include further mutations to remove 
undesirable phenotypes. These mutations may already be present in a starting yeast strain, or they 
may be introduced.

For example, many host cells express endogenous proteases which degrade heterologous proteins, 
but which are not essential to viability under laboratory conditions. Deletion of such proteases from 
the host improves recombinant protein expression. Thus a cell of the invention may include knockout 
mutations of one or more endogenous proteases. In yeast, deletion of PEP4 function (the
saccharopepsin aspartyl protease [49]) is a preferred mutation. Other proteases which can be knocked
out include Prb1, Prc1 and Cps1.

The host cell may have mutations in genes responsible for cell wall assembly, such that the cell wall is
weakened in order to simplify post-expression processing of cells. Such mutations make cells more
fragile, which may not be useful in a general laboratory bench setting, but would be very useful in a
specific expression system at an industrial scale where simplification of downstream processing is a
higher priority than benchtop resilience.

The host cell may have mutations to prevent slow growth e.g. deletion of cln3 or cln2 in yeast.

The host cell may also include heterologous genes encoding foreign proteins, such as those from
non-native metabolic pathways. For example, heterologous glycosyltransferases and other
glycosylation enzymes (e.g. mannosidases I and II, N-acetylglucosaminyl transferases I and II,
uridine 5'-diphosphate (UDP)-N-acetylglucosamine transporter, etc.) may be expressed in order to
increase the glycosylation repertoire of an expression host [50], and in particular to mimic human
glycosylation. Native pathways may be inhibited or knocked out to assist in this approach [51].

General 

The term "comprising" means "including" as well as "consisting" e.g. a composition "comprising" X 
may consist exclusively of X or may include something additional e.g. X + Y.
The term "about" in relation to a numerical value x means, for example, x±10%. 
The word "substantially" does not exclude "completely" e.g. a composition which is "substantially 
free" from Y may be completely free from Y. Where necessary, the word "substantially" may be 
omitted from the definition of the invention. 

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 illustrates the construction of starting strains for use with the invention, and figure 2 shows a
further development of this process, starting with the strain produced at the end of figure 1.
Figure 3 shows two maps of the pMG1 plasmid, with figure 4 showing its polylinker site.
Figure 5 shows expression from the pMG1 plasmid using glucose (5A) or galactose (5B).
Figure 6 shows the plasmid shuffling used in selecting cells of the invention. The yeast cell is shown
progressing from starting cell to intermediate cell to a cell useful for heterologous expression of
proteins according to the invention.

Figures 7 to 10 show the results of protein expression according to the invention. The lanes were 
loaded with protein from ~30ml of culture.

MODES FOR CARRYING OUT THE INVENTION

Construction of starting yeast strains

Diploid S.cerevisiae strains that are heterozygous for MOB1 (MOB1/mob1::kanR) are available. Such
a strain was obtained and was transformed with a pURA3 plasmid ("pRS316" [52]) carrying a
BamHI-EcoRI PCR fragment encompassing the entire MOB1 coding sequence plus flanking
regulatory elements [15]. This strain is gal2 (has sub-optimal growth on galactose as a sole carbon
source) and is Ura- (requires uracil in growth medium). Ura+ transformants were selected and
allowed to sporulate. After germination, haploid mob1::kanR strains were selected using G418. These
cells have lost their chromosomal MOB1, but its activity is complemented by the MOB1+ plasmid.
These cells were mated with a second haploid strain ("CG379" [53]) which was MOB1 trp1 GAL2
and the mated diploid cells were then sporulated. Spores which were trp1 GAL2 mob1::kanR (cannot
grow without tryptophan, can grow on galactose, G418 resistant) were selected for G418 resistance
and growth on galactose medium. One which was mating type a was designated MGY66 and had the
following relevant genotype: MATa mob1::kanR trp1 GAL ura3 pURA3-MOB1. MGY66 is a suitable
starting cell for use with the invention, and its overall construction is shown in Figure 1.

As a further development, shown in Figure 2, the PEP4 gene of this strain was knocked out and 
replaced with a LEU2 cassette [54]. The resulting strain is referred to as "MGY70" and is MATa
mob1::kanR trp1 GAL pep4::LEU2 ura3 pURA3-MOB1. The PEP4 gene encodes an aspartyl
protease ("saccharopepsin") which can degrade recombinantly-expressed proteins, but which is not 
essential for cell survival, and so its deletion can improve yields of stable recombinant proteins. 

Preparation of expression plasmids 

Starting with plasmid pESC-URA (Invitrogen™), a PvuI fragment was excised, which contains the
divergent, conditional and galactose-inducible yeast GAL1-10 promoters and yeast ADH and CYC1
terminators. This fragment was used to replace a PvuI fragment of pRS424 [55] to give "pESC-424".

An EcoRI-SpeI fragment encompassing the MOB1 coding sequence was made by PCR of yeast
genomic DNA using the following primers:

Fwd, with EcoRI site: CCCGAATTCATGTCTTTTCTACAAAAT

Rev, with SpeI site: CCCACTAGTCTACCTATCCCTCAACTCC

The PCR fragment was cloned into the GAL10 promoter of pESC-424 to give pESC-424-MOB1. The
same EcoRI site was then removed by infilling with Klenow DNA polymerase, to give
"pESC-424-MOB1-ΔEcoRI". Removal of this EcoRI site allowed a unique EcoRI site to be later
included in a polylinker.
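
The structure of the two MOB1 primers can be checked with a short script. The following is a minimal sketch which uses the primer sequences exactly as printed above; the helper names are hypothetical, and the interpretation of the three leading bases as a 5' clamp is an assumption.

```python
ECORI = "GAATTC"  # EcoRI recognition sequence
SPEI = "ACTAGT"   # SpeI recognition sequence
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Primer sequences as printed in the text.
FWD = "CCCGAATTCATGTCTTTTCTACAAAAT"
REV = "CCCACTAGTCTACCTATCCCTCAACTCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_primer(primer, site):
    """Split a primer into (5' clamp, restriction site, gene-specific part)."""
    index = primer.find(site)
    if index < 0:
        raise ValueError("restriction site not found in primer")
    return primer[:index], site, primer[index + len(site):]


fwd_clamp, _, fwd_anneal = split_primer(FWD, ECORI)
rev_clamp, _, rev_anneal = split_primer(REV, SPEI)

# The forward primer should start the coding sequence immediately after the site.
print(fwd_clamp, fwd_anneal[:3])
# Read on the coding strand, the reverse primer should end at a stop codon.
print(rev_clamp, reverse_complement(rev_anneal)[-3:] in STOP_CODONS)
```

The check confirms that the forward primer places the ATG start codon directly after the EcoRI site and that the reverse primer, read on the coding strand, terminates in a stop codon directly before the SpeI site.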

A BglII-XhoI fragment containing a GST coding sequence, a thrombin cleavage site and a polylinker
was made by PCR of pGEX-KG [56] and cloned between BamHI and XhoI sites of
pESC-424-MOB1-ΔEcoRI, to give the plasmid "pMG1" (Figures 3A & 3B). The polylinker site
(Figure 4) can receive genes encoding proteins of interest for expression as GST-fusions.

Transformation to express recombinant proteins (Figure 6) 

Plasmid pMG1 is grown in E.coli and a plasmid DNA miniprep is prepared. Separately, a gene
encoding a heterologous protein of interest is prepared which, after restriction enzyme treatment, will
have sticky ends that are compatible and in-frame with the polylinker site in pMG1. The two
molecules are digested and ligated to give a plasmid encoding the protein of interest in the form of a
GST-fusion protein. This plasmid ("pMG1-X") is transferred into MGY70 yeast by the lithium
acetate protocol, and is then selected on a minimal medium lacking tryptophan. As MGY70 is trp1,
only transformants survive. Next, the cells are grown on agar with uracil and 1mg/ml 5-fluoroorotic
acid, which selects against URA3+ cells. Surviving cells are those which have lost the pURA3-MOB1
plasmid, but which have retained pMG1-X as the sole source of MOB1.
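
The two selection steps of this plasmid shuffle can be summarised as a small truth-table model. The following is an illustrative sketch only: the class and function names are hypothetical, each plasmid is reduced to the markers named in the text, and the chromosome is assumed to be mob1::kanR trp1 ura3 as described for MGY70.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    genes: frozenset


# Plasmids reduced to the genes that matter for selection (a simplification).
COVERING = Plasmid("pURA3-MOB1", frozenset({"URA3", "MOB1"}))
EXPRESSION = Plasmid("pMG1-X", frozenset({"TRP1", "MOB1", "GST-X"}))


def carried_genes(plasmids):
    """Union of the genes carried by all plasmids in the cell."""
    genes = set()
    for plasmid in plasmids:
        genes |= plasmid.genes
    return genes


def viable(plasmids):
    """The chromosome lacks MOB1, so a plasmid must supply it."""
    return "MOB1" in carried_genes(plasmids)


def grows_without_tryptophan(plasmids):
    """First selection: the host is trp1, so TRP1 must come from a plasmid."""
    return viable(plasmids) and "TRP1" in carried_genes(plasmids)


def grows_on_foa(plasmids):
    """Second selection: uracil is supplied and URA3+ cells are killed by 5-fluoroorotic acid."""
    return viable(plasmids) and "URA3" not in carried_genes(plasmids)


for label, cell in [
    ("starting cell", [COVERING]),
    ("intermediate cell", [COVERING, EXPRESSION]),
    ("expression cell", [EXPRESSION]),
    ("no plasmid", []),
]:
    print(label, grows_without_tryptophan(cell), grows_on_foa(cell))
```

In this model only a cell carrying the expression plasmid alone passes both selections in sequence, which corresponds to the final cell shown in Figure 6.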

The final transformants can be grown in rich media (e.g. in YEP medium) without further selection.
The cells require uracil to grow, but this is supplied by rich media. The cells can be frozen at this
stage to provide long-term stocks e.g. freezing at -80°C in YEP medium with 20% glycerol.
Expression of the heterologous fusion protein can be induced by switching on the pGAL promoters.

Protein expression and purification

Yeast cells of the invention contain a heterologous gene under the control of a pGAL promoter. The
MOB1 gene is also under the control of a pGAL promoter. This arrangement allows a very high copy
number of the pMG1 plasmid to be achieved prior to expression of the heterologous gene, thereby
giving high expression levels. Furthermore, by keeping the heterologous gene in an "off" state at this
stage, any possible toxic effects of the heterologous gene are avoided.

Cells need MOB1 expression to survive. As the MOB1 gene is under the control of a pGAL 
promoter, which is repressed when cells are grown on glucose, it would seem on paper that the cells 
would die when grown on glucose. As repression is not 100% efficient, however, there is a low-level 
basal expression from the pGAL promoters (Figure 5A). This basal expression provides low levels of 
MOB1 to the growing cells, allowing survival. Moreover, the absolute need for MOB1 operates as a 
selection pressure to increase the copy number of pMG1. In the presence of glucose, therefore, the
copy number of pMG1 increases to high levels.

When expression of the heterologous protein is desired, the cells are transferred to a galactose 
medium. The absence of glucose and presence of galactose removes repression of the pGAL 
promoters and expression of the heterologous protein is thus induced (Figure 5B). Furthermore, the 
recombinant gene is expressed at even higher levels because of the high copy number resulting from 
the pGAL-controlled MOB1 selection.

After induction, cells are grown and then harvested. The cell lysate is applied to a glutathione
column, which retains the GST-fusion protein. After washing, thrombin is added to the column,
leading to elution of the cleaved heterologous protein in pure form.

Expression of murine TPL2 

This transformation/expression/purification process was followed for murine TPL2 protein. 

A pcDNA3 vector carrying the cDNA of the complete mouse TPL2 coding sequence was used as a
PCR template to generate a DNA fragment suitable for cloning into pMG1. The PCR forwards
primer included the first 18 coding bases of TPL2 preceded by a synthetic BamHI site. The BamHI
site was designed so that the TPL2 sequence was in frame with the 3' end of the GST sequence of
pMG1. The reverse primer had the last 18 bases of the negative strand in reverse 5'-3' orientation
preceded by a synthetic XhoI site. The PCR product was prepared for digestion using the Wizard
PCR Preps DNA Purification System. The PCR fragment and pMG1 were digested with BamHI and
XhoI restriction enzymes. The PCR fragment was again purified using the Wizard PCR Preps DNA
Purification System. The digested vector was electrophoresed through a 1.0% agarose TAE buffered
gel. Linear plasmid was excised from the gel and purified from the agarose using a Geneclean Kit.
Vector and PCR fragments were ligated by incubation together for 2h. Control ligations
were done with no insert DNA.
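
The primer design described above (a restriction site followed by 18 gene-specific bases) can be sketched as follows. This is an illustration under stated assumptions rather than the primers actually used: the coding sequence is a hypothetical placeholder and not TPL2, the function names are hypothetical, and the frame check assumes that the vector reads the BamHI site (GGATCC) as two complete codons.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence
XHOI = "CTCGAG"   # XhoI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(cds, n=18, fwd_site=BAMHI, rev_site=XHOI, frame_pad=""):
    """Build forward and reverse primers from the first and last n bases of a coding sequence.

    frame_pad holds any extra bases needed between the site and the gene to keep the frame.
    """
    cds = cds.upper()
    if len(cds) < n:
        raise ValueError("coding sequence is shorter than the annealing length")
    forward = fwd_site + frame_pad + cds[:n]
    reverse = rev_site + reverse_complement(cds[-n:])
    return forward, reverse


def keeps_frame(fwd_site=BAMHI, frame_pad=""):
    """True if site plus padding is a whole number of codons (assumes the site starts in frame)."""
    return (len(fwd_site) + len(frame_pad)) % 3 == 0


# Hypothetical 12-codon coding sequence used only to exercise the functions.
EXAMPLE_CDS = "ATGGCTGAATACCTGCGTAAAGTTCAGGAACTGTAA"
fwd, rev = design_primers(EXAMPLE_CDS)
print(fwd)
print(rev)
print(keeps_frame())
```

If the site in a given vector is not read as whole codons, one or two bases can be supplied through frame_pad so that the insert stays in frame with the upstream GST sequence.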

Ligation mixtures were transformed into E.coli DH10B. Transformed E.coli were selected on L agar
containing 20µg/ml ampicillin + 20µg/ml nafcillin. Individual clones were colony purified by
restreaking on amp+naf selective medium. Miniprep DNA of individual clones was prepared using
the Wizard Plus Minipreps DNA Purification System. Miniprep DNA was digested with BamHI +
XhoI restriction enzymes to identify clones carrying the ~1.6kb TPL2 coding sequence.

The DNA of three potentially positive pMG1-TPL2 clones was transformed into S.cerevisiae
MGY70 using the lithium acetate procedure. MGY70 transformants with this TRP1 plasmid were
selected by growth at 30°C on minimal agar medium lacking tryptophan. Two individual
transformant clones obtained from each miniprep DNA sample were colony purified by re-streaking
on agar medium lacking tryptophan. A single colony from each of these plates was streaked onto
minimal medium supplemented with 20µg/ml uracil and 1mg/ml FOA. FOA plates were incubated
for 2-3 days at 30°C. Single colonies were picked onto fresh FOA plates and grown for a further 2-3
days. In these cells the covering plasmid in MGY70 that provided the essential MOB1 gene had been
replaced by the expression plasmid and its copy of MOB1. From this point onwards these cells could
be grown on rich medium with no further conditional selection.

Examples of the resulting single colonies were next tested for protein expression. However, at this
stage it was useful to test whether expression of the cloned gene is toxic, as this influences the
induction regime for inducible gene expression. Induction of toxic gene products is indicated by
failure of the cells to grow on rich agar medium with 2% galactose as carbon source. Induction of the
potential TPL2 clones was not toxic as judged by this simple test.

Three potential isolates originating from three independent ligation events were tested for expression
of TPL2. 50ml overnight cultures were grown at 30°C in rich YEP medium with 2% raffinose as
carbon source. The cultures were inoculated so that cell density after overnight growth was
approximately 5x10^7/ml. The overnight cultures were used to inoculate 500ml of YEP medium
supplemented with 2% galactose as carbon source and grown for 6-8h at 30°C. Cells from 50ml and
450ml of culture were harvested by centrifugation, frozen rapidly on dry ice and stored at -80°C. The
small pellets were used to check for induced expression of TPL2 while the larger pellets were held in
reserve for preparation of Tpl2 for experimental use.

Small pellets were resuspended in 400µl of lysis buffer (50mM Tris-HCl pH 7.5, 250mM NaCl, 1%
Nonidet P40, 10% glycerol, 4mM dithiothreitol, 200µg/ml sodium orthovanadate, 10mM NaF,
50mM glycerol-2-phosphate, 1mM PMSF, 'Complete' protease inhibitor (Roche™)). For cell lysis,
glass beads, 0.5mm diameter, were added to the meniscus in 2ml screw cap tubes which were then
shaken three times for 10sec in a RiboLyser apparatus (Hybaid™). Cell lysate was recovered by
piercing the base of the tube, followed by centrifugation inside a larger tube. Cell debris and insoluble
material were removed by 2x15 min centrifugation at 13000 rpm in a refrigerated microcentrifuge.
The cleared lysate was added to 50µl of glutathione sepharose beads which had been pre-equilibrated
in 250mM NaCl, 50mM Tris-HCl pH 7.5, 0.2% Nonidet P40. The beads were gently mixed with the
lysate on a rotor at 4°C for 1-2h. The beads were washed 5x with 250mM NaCl, 50mM Tris-HCl pH
7.5, 0.2% Nonidet P40, 4mM dithiothreitol. Proteins bound to the glutathione sepharose beads were
analysed by SDS-polyacrylamide gel electrophoresis. Protein bands were visualised by staining with
Coomassie blue (Figure 10).

Large cell pellets were resuspended in lysis buffer (approximately 10ml per 1g of cells). Cells were
lysed with a French pressure cell operating at 20000psi. Cleared lysates were made by centrifugation at
18000g for 2x20 min at 4°C. Large scale affinity purification of GST-TPL2 was essentially as
described above except that appropriately increased amounts of reagents were used.

In contrast to the successful expression of TPL2 using the system of the invention, attempts to
express the protein in E.coli using the pGEX-4T and pET28 plasmids failed. The attempts used the
full length protein as well as deletion derivatives lacking the N-terminal 30 residues and/or the
C-terminal 70 residues (an oncogenic form). The kinase domain on its own was also tested. In all
cases, however, any product which was seen (very little) was heavily degraded, inactive, insoluble or
aggregated and was thus of limited use.

Expression was also attempted without success using the Invitrogen™ DES system using the
pMT/V5-His vector and S2 Drosophila cells.

Expression of other proteins

Essentially similar procedures were used to produce GST-tagged S.cerevisiae Cdc16, Bfa1, Bub2,
Tem1 and three deletion derivatives of Clb6 that contain the cyclin box domain. With Bfa1 and the
Clb6 deletions, over-expression of the expressed proteins was toxic and reduced cell growth during
the galactose induction period. To compensate for this, 500ml of overnight culture of these cells in
YEP + 2% raffinose was used to inoculate a further 1 litre of YEP medium with a final concentration
of 2% galactose. Induction then proceeded for 3-4h before harvesting.

The MOB1 expression system of the invention has been used to express full size Bfa1 (Figure 7),
Bub2 (Figure 8), Lte1 (Figure 9), Tem1, kinase-dead Cdc15, TPL2 (Figure 10), an oncogenic
C-terminally deleted TPL2, TPL2 deleted for 30 N-terminal residues, TPL2 deleted for both 30
N-terminal and 70 C-terminal residues, a kinase dead mutant of TPL2, the SARS virus Nsp13 putative
mRNA cap-1 methyltransferase and three deletion derivatives of Clb6. All of these proteins have
long histories of being difficult or impossible to produce in other systems but all of them give a
GST-fusion product using the MOB1 system of the invention.

Lte1 is a large yeast protein (>1400 amino acids). It could not be expressed as a GST-fusion protein
using either the pGEX-KG E.coli expression system or the pBacPak baculovirus system. In contrast,
expression using the MOB1 system of the invention gave high-level expression of the fusion protein
in soluble form (Figure 9).

Tem1 is a small Ras-like GTP-binding protein in the regulatory cascade of the mitotic exit network
[33,57]. Expression in E.coli was attempted with a variety of vectors: pGEX-KG (GST-fusion) and
pET28 (hexahistidine tag) did not give useful expression although small quantities of MBP-Tem1
were obtained from a pMAL-c2X vector [33]. Expression of N-terminal fragments (amino acids
1-228 or 1-190) and of a Q79L mutant were also tested in various E.coli vectors, with no success. A
hexahistidine fusion was tested without success in P.pastoris using the pPICZB vector, and the
pBacPak8 GST-fusion system also failed in baculovirus. In contrast, expression using the MOB1
system of the invention gave high-level expression of the GST-Tem1 fusion protein in soluble form.

Bub2 is part of a GTPase-activating protein complex involved in the mitotic exit network [33].
Expression of Bub2 was attempted in E.coli using the vectors pGEX-KG, pMAL-c2X and pET28 but
only the GST-fusion was expressed and this was with large amounts of E.coli GroEL chaperone
protein. Expression of fragments (amino acids 36-258) and of a GST-Bub2-His6 protein were also
tested in various E.coli vectors, with no success. The pPICZαA vector failed in P.pastoris, as did the
pBacPak8 and pBAC4X vectors in baculovirus. In contrast, expression using the MOB1 system of
the invention gave high-level expression of the GST-Bub2 fusion protein in soluble form (Figure 8).

Bfa1 is the other half of the GTPase-activating protein complex (Bfa1/Bub2) [33]. Expression of
Bfa1 was attempted in E.coli using the vectors pGEX-KG, pGEX-His and pMAL-2c. Only
MBP-fusion proteins could be expressed successfully. The pPICZB vector failed in P.pastoris, as did
the pBacPak8 vector in baculovirus. In contrast, expression using the MOB1 system of the invention
gave high-level expression of the GST-Bfa1 fusion protein in soluble form (Figure 7).

GST-Nsp13 expressed from pGEX-6P-2 in E.coli was insoluble but soluble GST-Nsp13 was obtained
using the MOB1 system. After cleavage of the fusion protein with human rhinovirus protease
(PreScission Protease) yields were approximately 1mg Nsp13/litre of induced cells.

Hetero-oligomers 

Yeasts are made with chromosomal deletions of both MOB1 and CDC33. To complement the
deletions, yeast are kept alive by a 'covering' plasmid expressing both MOB1 and CDC33 and
carrying a URA3 selective marker. To insert the heterologous gene products, one plasmid is pMG1 as
described above and the other is a similar plasmid where (a) MOB1 is replaced by CDC33 and
(b) the conditional selective marker HIS3 replaces TRP1. To allow separate purification, the second
plasmid uses an epitope tag, a hexahistidinyl tag or no tag rather than a GST fusion.
Heterologous sequences are cloned into the two expression plasmids. The two plasmids are
co-transformed into a yeast host, selecting for Trp+ and His+ prototrophy. Cells that have lost the
URA3-covering plasmid are selected on FOA to give a cell capable of expressing two different
proteins.
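
The selection logic of this two-plasmid shuffle can be illustrated with a small model. The following is a minimal sketch only, not a description of any laboratory software. It assumes that the host is auxotrophic for tryptophan, histidine and uracil apart from the plasmid-borne markers, and that FOA kills any cell carrying URA3. The class, constant and function names (including the label "second" for the CDC33/HIS3 plasmid) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    essential_genes: frozenset  # essential genes carried by the plasmid
    markers: frozenset          # conditional selection markers carried


# Plasmids as described in the text; "second" is a placeholder name
COVERING = Plasmid("covering", frozenset({"MOB1", "CDC33"}), frozenset({"URA3"}))
PMG1 = Plasmid("pMG1", frozenset({"MOB1"}), frozenset({"TRP1"}))
SECOND = Plasmid("second", frozenset({"CDC33"}), frozenset({"HIS3"}))

# Essential genes deleted from the host chromosome
DELETED_FROM_CHROMOSOME = frozenset({"MOB1", "CDC33"})


def viable(plasmids, without_trp=False, without_his=False, foa=False):
    """Return True if a cell carrying these plasmids survives the given medium."""
    genes = frozenset().union(*(p.essential_genes for p in plasmids))
    markers = frozenset().union(*(p.markers for p in plasmids))
    if not DELETED_FROM_CHROMOSOME <= genes:
        return False  # an essential function is missing: unconditionally lethal
    if without_trp and "TRP1" not in markers:
        return False  # assumed trp1 host cannot grow without tryptophan
    if without_his and "HIS3" not in markers:
        return False  # assumed his3 host cannot grow without histidine
    if foa and "URA3" in markers:
        return False  # URA3 is counter-selected on FOA
    return True


# After co-transformation and FOA selection, only cells that keep BOTH
# expression plasmids and have lost the covering plasmid survive.
print(viable([PMG1, SECOND], without_trp=True, without_his=True, foa=True))
print(viable([COVERING, PMG1, SECOND], foa=True))
print(viable([PMG1], foa=True))
```

In this model, losing either expression plasmid after the shuffle removes an essential gene, which is why both plasmids are retained even without continued Trp/His selection.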

In related work, GST-Mob1 was expressed with untagged Dbf2 in ^/-deleted cells. Dbf2 is a
kinase and Mob1 is an accessory protein required for activity. The divergent GAL1-10 promoter
expressed GST-Mob1 in one direction and untagged Dbf2 in the other. Purification of GST-Mob1 on
glutathione sepharose also yielded approximately equimolar amounts of untagged Dbf2,
demonstrating how hetero-oligomers can be purified.

Expression in Escherichia coli 

An E.coli BL21 derivative with good induction and protein stability characteristics is selected.
An essential gene for chromosomal deletion is chosen.
A covering plasmid based on pACYC184 is prepared, including: (a) the essential gene, prepared by
PCR from E.coli genomic DNA and including its natural promoter and regulatory sequences; (b) the
conditionally-lethal sacB marker to allow counterselection during confirmation of chromosomal
deletion and during plasmid shuffling; (c) a p15A replication origin; (d) a chloramphenicol selection
marker. The plasmid is transformed into E.coli in preparation for deletion of the essential
chromosomal gene.

After introduction of the covering plasmid, the chromosomal copy of the essential gene is replaced
with a drug resistance marker using the methods described in reference 46 or 47. The drug resistance
marker allows inheritance of the modified gene to be followed. Confirmation that the essential gene
is provided by the covering plasmid and not by the chromosome can be provided by attempting to
grow a bacterium in sucrose-based medium: because sacB is lethal in the presence of sucrose, a
bacterium that depends on the covering plasmid should fail to grow.

An expression plasmid based on pETDuet (Novagen™) is prepared, including: (a) the essential gene;
(b) a mammalian, viral or other eukaryotic gene of interest; (c) two multiple cloning sites adjacent to
tandem T7lac inducible promoters, with one MCS including a hexa-His tag; (d) a ColE1 replication
origin, which is compatible with the p15A origin used in the covering plasmid; and (e) an ampR
gene, which allows the plasmid to be distinguished from the covering plasmid. The two genes (a) and
(b) are under the control of the two T7lac promoters. A simpler system uses a normal pET or pGEX
vector, with only a single MCS for receiving the mammalian gene; the essential gene with its own
promoter is first cloned into a non-MCS site.

The expression plasmid is transformed into the E.coli to give a bacterium carrying both the covering 
plasmid and the expression plasmid. 

Loss of the covering plasmid is then selected by growing bacteria on sucrose. This growth stage can 
be preceded by a period of growth in the absence of chloramphenicol, in order to provide an 
opportunity for 'natural' loss of the covering plasmid. After the sucrose counterselection, loss of the
covering plasmid is confirmed by checking for chloramphenicol sensitivity. After this confirmation,
there is no need for further use of antibiotics during growth as the expression plasmid can be
maintained by its providing the essential gene rather than by its ampR gene. The bacteria can thus be 
grown through several cultures in order to eliminate any trace of chloramphenicol, thereby giving an 
antibiotic-free preparation of bacteria which can be used to express the mammalian protein without 
antibiotic contamination. 
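
The phenotype checks used to confirm the bacterial plasmid shuffle can be summarised in a short model. The following is a minimal sketch only. It assumes that the chromosomal copy of the essential gene has already been deleted, that sacB on the covering plasmid prevents growth on sucrose, and that chloramphenicol and ampicillin resistance come only from the covering and expression plasmids respectively. The function names are hypothetical.

```python
def phenotype(has_covering: bool, has_expression: bool) -> dict:
    """Predict phenotypes for a bacterium lacking the chromosomal essential gene."""
    return {
        # The essential gene must come from at least one plasmid
        "viable": has_covering or has_expression,
        # sacB on the covering plasmid is counter-selected on sucrose, so
        # growth requires the essential gene from the expression plasmid alone
        "grows_on_sucrose": has_expression and not has_covering,
        "chloramphenicol_resistant": has_covering,
        "ampicillin_resistant": has_expression,
    }


def shuffle_complete(p: dict) -> bool:
    """True if the observed phenotypes indicate loss of the covering plasmid."""
    return p["grows_on_sucrose"] and not p["chloramphenicol_resistant"]


for covering, expression in [(True, False), (True, True), (False, True)]:
    p = phenotype(covering, expression)
    print(covering, expression, p, shuffle_complete(p))
```

Only the last case, where the expression plasmid alone supplies the essential gene, passes both the sucrose and the chloramphenicol-sensitivity checks.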

Bacteria are cultured and then induced under standard conditions using IPTG. The mammalian protein
is expressed as a GST fusion protein which is then purified using the appropriate affinity column.
The native protein is released using thrombin cleavage to give a final purified product.

In a further development, the expression plasmid includes the oriV/TrfA replicon system for copy 
number amplification, as disclosed in reference [58]. 

It will be understood that the invention has been described by way of example only and modifications 
may be made whilst remaining within the scope and spirit of the invention. 
CLAIMS

1. A cell that expresses both chromosomal genes and extra-chromosomal genes, wherein (a) the 
expressed extra-chromosomal genes include a gene with an essential function, the expression of 
which is unconditionally required for survival of the cell, (b) the expressed chromosomal genes do 
not provide that essential function, and (c) the extra-chromosomal genes include a heterologous 
gene, the expression of which is controlled by a promoter that is functional in the cell. 

2. A method for expressing a heterologous gene, comprising the step of growing the cell of claim 1 
in a culture medium. 

3. A method for purifying a protein, comprising the steps of: (a) growing the cell of claim 1 by the 
method of claim 2, such that it expresses said protein; and (b) purifying the protein. 

4. The method of claim 3, further comprising the step of: (c) treating the protein with a protease to 
provide a cleavage product of interest. 

5. A cell that expresses both chromosomal genes and extra-chromosomal genes, wherein (a) the 
expressed extra-chromosomal genes include a gene with an essential function, the expression of 
which is unconditionally required for survival of the cell, (b) the expressed chromosomal genes 
do not provide that essential function, and (c) the extra-chromosomal genes include a 
conditionally-lethal gene, wherein the essential gene is MOB1, Cdc33 or Hsp10.

6. A cell that expresses chromosomal genes, a first set of extra-chromosomal genes and a second set 
of extra-chromosomal genes, wherein (a) the expressed first and second sets of extra- 
chromosomal genes both include a gene with the same essential function, the expression of 
which is unconditionally required for survival of the cell, (b) the expressed chromosomal genes 
do not provide that essential function, (c) the first set of extra-chromosomal genes includes a 
conditionally-lethal gene, and (d) the second set of extra-chromosomal genes includes both a 
conditionally-required gene and a heterologous gene. 

7. An extra-chromosomal vector, comprising: (a) an essential gene whose expression is 
unconditionally required for survival of a cell of interest; (b) a conditionally-required gene to 
allow selection of host cells which include the extra-chromosomal vector; and (c) a gene 
encoding a heterologous protein of interest operably linked to a promoter that is functional in the 
cell of interest. 

8. The vector of claim 7, wherein the vector is a plasmid. 

9. The vector of claim 7 or claim 8, wherein the conditionally-required gene is a resistance gene. 

10. The vector of claim 9, wherein the resistance gene is an antibiotic resistance gene, a drug 
resistance gene, or a herbicide resistance gene. 
11. The vector of claim 7 or claim 8, wherein the conditionally-required gene complements an
auxotrophic mutation in the host's chromosome.

12. An extra-chromosomal vector, comprising: (a) an essential gene whose expression is
unconditionally required for survival of a cell of interest; (b) a conditionally-lethal gene to allow
selective killing of host cells which include the extra-chromosomal vector, wherein the essential
gene is MOB1, Cdc33 or Hsp10.

13. The vector of any one of claims 7 to 12, comprising one or more of the following elements: (i) an
origin of replication functional in a host cell of interest; (ii) a polylinker containing a plurality of
restriction sites; (iii) a transcription termination sequence downstream of one or more of the
promoters and their coding sequences in the vector.

14. The vector of any one of claims 7 to 13, comprising one or more of: (iii) an origin of replication 
functional in bacteria; and (iv) an antibiotic resistance marker suitable for selection of bacterial 
transformants. 

15. A method for preparing the cell of claim 1, comprising the steps of: (a) obtaining the cell of 
claim 5, which includes a conditionally-lethal gene; (b) transforming the cell with the vector of 
any one of claims 7, 8, 9, 10, 11, 13 or 14, which includes a conditionally-required gene, to give 
the cell of claim 6; (c) selecting transformants which express the vector's conditionally-required 
gene; and (d) selecting transformants which lose the conditionally-lethal gene. 

16. The cell, method or vector of any preceding claim, wherein the essential gene is a gene whose 
loss prevents cell division, prevents mitosis, prevents transcription, or prevents translation. 

17. The cell, method or vector of any preceding claim, wherein the essential gene has a coding 
sequence of <3000 base pairs.

18. The cell, method or vector of any preceding claim, wherein the essential gene is not lethal when 
hyper-expressed. 

19. The cell, method or vector of any preceding claim, wherein the essential gene is MOB1. 

20. The cell, method or vector of any preceding claim, wherein the heterologous gene comprises a 
sequence from a higher eukaryote or a eukaryotic virus. 

21. The cell, method or vector of claim 20, wherein the eukaryote is an animal.

22. The cell, method or vector of any preceding claim, wherein the heterologous gene encodes a 
fusion protein comprising a first sequence and a second sequence. 

23. The cell, method or vector of claim 22, wherein the junction between the first sequence and 
second sequence includes a protease recognition sequence. 
24. The cell, method or vector of claim 23, wherein the protease is thrombin, factor Xa protease,
enterokinase, endopeptidase rTEV or human rhinovirus protease 3C. 

25. The cell, method or vector of claim 22, wherein the junction between the first sequence and 
second sequence includes an intein. 

26. The cell, method or vector of any preceding claim, wherein the heterologous gene comprises a 
sequence encoding glutathione-S-transferase, a poly-histidine tag, a calmodulin-binding peptide, 
a maltose-binding protein, a chitin-binding domain, or an immunoaffinity epitope. 

27. The cell, method or vector of any preceding claim, wherein the heterologous gene encodes a 
protein which forms oligomers. 

28. The cell, method or vector of any preceding claim, wherein the heterologous gene is expressed as 
a soluble protein. 

29. The cell, method or vector of any preceding claim, wherein expression of the essential gene is 
controlled by an inducible promoter. 

30. The cell, method or vector of any preceding claim, wherein expression of the heterologous gene
is controlled by an inducible promoter.

31. The cell, method or vector of claim 29 or claim 30, wherein the promoter is a repressible 
promoter. 

32. The cell, method or vector of claim 31, wherein the heterologous gene and the essential gene are 
inducible and/or repressible by the same stimulus. 

33. The cell, method or vector of any preceding claim, wherein expression of the essential gene
and/or the heterologous gene is controlled by a galactokinase/UDP-glucose 4-epimerase promoter.

34. The cell, method or vector of any preceding claim, wherein the cell is a eukaryote. 

35. The cell, method or vector of claim 34, wherein the eukaryote is a yeast. 

36. The cell, method or vector of claim 35, wherein the yeast is Saccharomyces cerevisiae or
Schizosaccharomyces pombe.

37. The cell, method or vector of any preceding claim, wherein the heterologous gene encodes a Lte1
protein, a Bfa1 protein, a Bub2 protein, a CDC5 protein, a CDC15 protein, a CDC28 protein, a
Tpl2 protein, a SARS virus Nsp13 protein, or a mRNA Cap1 methyltransferase protein.
ABSTRACT

Selection markers in prior art systems are based on resistance genes or on complementation of
auxotrophic mutations. The requirement for expression of these markers is conditional e.g. on the 
presence of an antibiotic, or on the absence of a nutrient. In contrast, the selection markers used in 
this invention are non-conditional, and selection pressure is absolute. The markers involved are genes 
which encode essential survival factors, such that loss of the marker gene is lethal. Thus the 
invention provides a cell which expresses chromosomal genes and extra-chromosomal genes,
wherein (a) the expressed extra-chromosomal genes include an essential gene whose expression is 
unconditionally required for survival of the cell, (b) the expressed chromosomal genes do not include 
said essential gene, and (c) the extra-chromosomal genes include a heterologous gene. The cells can 
conveniently be obtained by a plasmid shuffling procedure. 